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Abstract 

We analyze safety problems of complex systems using the methods of 
mathematical statistics for testing the output variables of a code simulating 
the operation of the system under consideration when the input variables 
are uncertain. We have defined a black box model of the code and derived 
formulas to calculate the number of runs needed for a given confidence level 
to achieve a preassigned measure of safety. In order to show the capabilities of 
different statistical methods, firstly we have investigated one output variable 
with unknown and known distribution functions. The general conclusion 
has been that the different methods do not bring about large differences in 
the number of runs needed to ensure a given level of safety. Analyzing the 
case of several statistically dependent output variables we have arrived at 
the conclusion that the testing of the variables separately may lead to false, 
safety related decisions with unforseen consequences. We have advised two 
methods: the sign test and the tolerance interval methods for testing more 
then one mutually dependent output variables. 
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1 Introduction 



There are two approaches to safety analysis of large complex systems. Since 
the analysis has to demonstrate safety of the operation under the investigated 
circumstances, we may scrutinize a not too realistic but rather unfavorable 
situation saying that if that situation is safe then any real situation must be 
on the safe side. This approach we call conservatism. 

An alternative approach may attempt to investigate the real situation 
and show that no limit violation can occur. In this case the calculated values 
should be increased by the possible error when compared with the safety 
limit PP . That approach is called best estimate which is not a very fortunate 
but generally accepted name. 

In conservative analysis, the first problem is in the selection of the case 
to be studied. It identifies an overt attempt to bound the actual expected 
state hence it should estimate also the consequences of model uncertainties. 
How do we know if a given situation is more conservative than is the other? 
It is often impossible to foresee the outcome of a non-linear process. Another 
problem may be the interplay between approximations. It may happen that 
either of two approximations leads to conservatism but their simultaneous 
presence does not. The conservative approach has been prevalent for a long 
time, although today rather the best estimate methods are in the focus. 

The main difficulty with best estimate calculation is in the complexity of 
the phenomena involved. (A new material phase may appear, at a given tem- 
perature chemical reactions may take place producing new material proper- 
ties, and also producing or removing heat, the process dynamics is nonlinear 
etc.) In spite of the problem's complexity, a best estimate method attempts 
to solve the equations describing the involved physical processes as accurately 
as our knowledge permits. From licensing viewpoint, several key parameters 
should be selected and compared to the acceptance criteria. 

Best estimate methods are accompanied by an uncertainty analysis to 
learn the uncertainty band of the response The purpose of the uncertainly 
evaluation is to provide assurance that the selected parameters at least with 
probability 95% or more will be in the acceptance region or will not exceed 
their acceptance level. 

The present work is dealing with the code uncertainty only, which is rather 
important constituent of the total uncertainty. We assume the modeled pro- 
cedure to start from a known initial state. All the physical quantities in the 
model we sort as input, output, and latent data. By definition, a datum is 
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input if its domain is known along with a distribution function associating 
a probability with any admitted value. In a model, there are several con- 
stants, which are considered either as input or latent data. Input, when the 
given constant is looked upon as a variable in a given range and a probabil- 
ity is allotted to every possible value. Distinction between input and latent 
data is a matter of engineering judgement. The nature of the distribution 
may depend on the determination of the constant. Latent, when we refrain 
from analyzing the uncertainties of the constant, temporarily we take it as a 
fixed number. A datum not falling into the input or latent category is called 
output. 

The paper is organized as follows. In Section 2 we define a simple black 
box model linking the output variables to input variables, while in Section 3 
we analyze possibilities and limitations of several well-known statistical tests 
for one output variable with unknown and known cumulative distribution 
function. Special attention is paid to the application of a slightly new variant 
of the tolerance interval method. In Section 4 we deal with the case of several 
not independent output variables by using the advantages of order statistics, 
and, finally the conclusions are summarized in Sections 5 and 6. 

The present work focuses on deriving criteria for safe operation when the 
output variables are fluctuating as a result of randomness of input variables, 
and intends to give some help in practical applications. In the sequel we fol- 
low the notation used in the classical handbook of statistics by M.G. Kendall 
and A. Stuart 0. 



Let us consider a system as complex as a nuclear power plant, or an oil 
refinery plant for instance. Assume we have a model describing that system, 
and that model enables us to calculate physical parameters characterizing 
the system at arbitrary instant t. Let n be the number of technologically 
important variables. In the frame of the model, the operation of the system 
is considered safe if all calculated variables belong to a given set of intervals 



determined by the technology. 

In order not to be set back by the complexity of the problem, we sug- 
gest a simple black box model, in which output variables are linked to input 
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variables. That link can be a computer code that transforms vector x G A", 
the input variables, into a vector y{t) G 3^, the output variables. Here X 
and y are sets of all possible values of x and y{t), respectively. In general, 
the dimension of x, i.e. the number of input variables is not the same as 
the dimension of y{t), i.e. the number of output variables. Every data that 
enters into the model is treated as an input variable, hence we do not dis- 
tinguish parameters. The model is an explicit relationship between input x 
and output y: 

yit) <= Cit)x, (1) 
where C{t) is a nonlinear operator that maps 



X 



X2 
\XH J 



into 



m 



( yiit) \ 
y2it) 



In practical cases the link between input and output is very complex hence 
there is no reason to anticipate an analytical relationship like y{t) = f{x,t). 
In the sequel C{t) is assumed to be deterministic, in other words once the 
input has been fixed, we obtain the same output within the computation 
accuracy for each run. At the same time, if the input vector fluctuates ac- 
cording to distribution laws simulating possible variations of the technology, 
or, reflecting uncertainty of some parameters of the model then the output 
parameters also fluctuate in repeated runs. 

We present an illustration of how random input may influence an output 
variable, see Fig. 1. We used the thermohydraulic code ATHLET ||4, to 
generate several output variables for a simple experimental setup, but in 
Fig. 1 we presented only one output variable as function of time for three 
independent runs. It is obvious that in this case the above given criterion 
for safe operation of the system needs to be changed because there is no 
guarantee that a new run after a successful run will also be successful. 

We call a state Xq nominal, if all the input parameters take their respective 
expectation value, i.e. Xq = E{x}. We can perform a calculation in the 
nominal state to get the corresponding output yo = C(t)xQ. Usually the 
state xo is called safe if yo is in the safety envelope Vt- However, we need a 
more stringent definition: state x is called safe if y is in the safety envelope 
Vt for every x E X. 
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Fi gure 1: Influence of the random input on the time dependence of one output variable 
in three independent runs. 



Here we should make three remarks, (i) X may be an infinite interval 
when at least one of the input variables is of normal distribution. In practi- 
cal calculations such variables are confined to a finite interval by engineering 
judgement, (ii) We check that statement by a given, finite number of calcula- 
tions ^,5j with input from X. If there is a value outside the safety envelop Vt 
the state x is unsafe independently of the fact that the nominal state xq may 
be safe, (iii) Even if every calculated output is safe, there is a probability 
that the state is actually unsafe. 

Fixing time t after N runs we obtain randomly varying output vectors 
{yiit),y2{t), ■ ■ ■ ,yN{t)} which carries information on the fiuctuating input 
and the code properties. In the next Section we are considering only one 
output variable with continuous cumulative distribution function G{y) = 
I-oD time t is taken as fixed and its notation is omitted. 



3 One output variable 
3.1 Old Bayesian method 

If we carry out runs with fiuctuating input, then we obtain a sample 
Sn = {yi, y2, ■ ■ ■ , Vn} of the random variable ?/ at a fixed time point. Through 
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technological considerations, we define a fix acceptance and a fix rejection 
interval to variable y. Let the acceptance interval be Ha = [Lt,Ut], and 
Hr = [Lt, Ut] = {—oo, Lt) U {Ut, +oo) the rejection interval. ^ 
The probability 

V{y G Tia] = / g{u) du = w 

JHa 

that an observed y lays in the acceptance interval is not known. Knowing 
however that k elements of the sample Sn are in the acceptance interval Tia, 
then utilizing Bayes' theorem, without knowing g{u), we can claim that 

l3{uj\N, k) = 



Jul 



m'^ (1 - du ^ V J 



[l-uy 00^+^-^ = p{u\N,k) (2) 



is the probability that the unknown acceptance probability w is greater than 
a prescribed uj. The proof of the mentioned theorem is available in textbooks^ 
hence we omit it here. We wish to point out the expression 

f3{uj\N,0) = l-uj''+\ (3) 

which shows convincingly that even when the whole sample iS^v consists of 
elements to be accepted, we can state only that w > u with probability 
1 — cu^^^. If one element in the sample Sn is in the rejection interval, then 
we have 

(3{u\N, 1) = 1 - uj^+^ - (iV + 1) (1 - uj)uj^. (4) 

Using (0), one can easily determine the allowed number of rejections in a 
sample of elements so that the unknown probability of the acceptance w to 
be larger than the prescribed limit u with a given probability f3{u!\N,k) > a. 
It can be agreed on that a system is safe if it is almost certain (0 << a < 1) 
that the unknown probability of the acceptance w is larger than a prescribed 



^In many practically important cases Lt = — c», and so Ha = (— oo, Ut] and 7i,. = 
([/t,+oo). 

^Pal. L.: Fundamentals of probability Theory and Statistics, vol. I. -II., 109-113, 
Budapest, Akademiai Kiado, Budapest (1995), in Hungarian. 
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Table I. Number of observations N at which w > co with probability (3{oj\N, k) > a 
for several values of a, oj, and the number of rejected values N — k. 



a 


UJ 


N-k = Q 


N-k = l 


N-k = 2 




0.90 


21 


31 


51 


0.90 


0.95 


44 


75 


104 




0.99 


228 


387 


530 




0.90 


27 


45 


60 


0.95 


0.95 


57 


92 


123 




0.99 


297 


472 


626 




0.90 


43 


63 


80 


0.99 


0.95 


89 


129 


164 




0.99 


457 


660 


836 



For example, we read out from Table I that if all the 297 observed values 
were acceptable, i.e. there was not a single value to be rejected, then, larger 
than 95% is the probability that w > 0.99, i.e. the proportion of rejected 
observations in any sample will be not larger than 0.01. The more observa- 
tions we have, with the higher probability we can state that the investigated 
system is safe, and the higher is the lower level u for the unknown acceptance 
probability w. 

3.2 Distribution free confidence interval for quantile 

Assume again the cumulative distribution function G{y) of the output vari- 
able y to be unknown but continuous and strictly increasing. Denote by 
the 7-quantile of G{y), i.e the value satisfying the equation 

GiQ,) = dG{y) = 7- 

J — oo 

Clearly, the interval (— oo, Q^] covers the proportion 7 of the distribution 
G{y). Since G{y) is continuous and strictly increasing ^ one can write 

Q, = G-\^). 

It is to mention that the point estimate of is that element of the ordered 
sample the index k of which is the nearest integer to A^7. 

^If G{y) is a continuous and not decreasing function, then Q-^ = 'mi{y : G{y) > 7}. 



7 



3.2.1 Two-tailed test 

Carrying out N independent runs, we get a sample Sn — {yi, ■ ■ ■ , Vn}- Ar- 
range the sample elements in increasing order, ^ and denote by y{k) the kth 
of ordered elements; hence we have 

y{l) < y{2) <■■■< y{r) <■■■< y{s) <■■■< y{N), 

and by definition y{0) = — oo, while y{N + 1) = +00. As known the joint 
density function of random variables 

z{r) = G[y{r)] and z{s) = G[y{s)], 

where r and s > r are positive integers from {1,2,..., N} is given by 

_ M'-l {V - uy-'--'^ (1 - V)^~' 

- B{r,s-r) B{s,N-s + l) ' 
0<u<v <1. 
Here B{j, k) is the Euler beta function. 

Theorem 1 If r and s positive integers satisfying the inequality < r < 
{N + 1)7 < s < N, then the random interval [y{r),y{s)] covers the unknown 
^-quantile with probability 

l3^V{y{r)<Q^<y{s)}^ 



= 7(1 - 7, - s + 1, s) - 7(1 - 7, - r + 1, r) (5) 

where 

is the regularized incomplete beta function for non-singular cases. 

The proof of the theorem is simple and it can be find in the Appendix I. 
One can see that the confidence level f3 for the the random interval [y{r),y{s)] 
does not depend on G{y), in other words, the confidence interval for the 
unknown is distribution free. 

Clearly, there are many different confidence intervals covering with a 
prescribed probability (3. We have to chose the shortest interval by using the 
following procedure: 

"^The probability that equal values occur is zero. 
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• from the ordered sample determine the integer q = [(N + 1)7] due to 
the point estimate Q-y — y{q) of the 7-quantile Q^, 

• calculate the confidence level P step by step for intervals defined by 

integer pairs [rj, Sk] where Vj = q — j, j = 1,2, ... q — 1 and = 
q + A;, A; = 1, 2, . . . , — k, respectively, until the prescribed value of 
(3 is reached provided that it is possible at the sample size N that we 

have, 

• if the prescribed f3 could not be reached, then the sample size should 
have been increased. 

When the confidence interval [y{rj),y{sk)] covering the ^-quantile of the 
unknown distribution G{y) at prescribed confidence level (3 is a part of the 
interval [Lt, Ut] defined by technology, then the system can be qualified safe 
at level (/3|7). 

Table II. Confidence levels for confidence intervals covering the unknown quantile 
Q0.9 in the case of sample size N = 100. (The point estimate of Q0.9 is equal to Qo.g = 
2/(90)). 



r\s 


95 


96 


97 


98 


99 


100 


89 


0.6455 


0.6793 


0.6952 


0.7011 


0.7027 


0.7030 


88 


0.7442 


0.7781 


0.7940 


0.7999 


0.8015 


0.8018 


87 


0.8185 


0.8524 


0.8683 


0.8742 


0.8758 


0.8761 


86 


0.8699 


0.9037 


0.9196 


0.9255 


0.9271 


0.9274 


85 


0.9025 


0.9364 


0.9523 


0.9582 


0.9598 


0.9601 


84 


0.9218 


0.9557 


0.9716 


0.9775 


0.9791 


0.9794 


83 


0.9324 


0.9663 


0.9822 


0.9880 


0.9897 


0.9900 


82 


0.9378 


0.9717 


0.9876 


0.9935 


0.9951 


0.9954 


81 


0.9404 


0.9743 


0.9902 


0.9961 


0.9977 


0.9980 


80 


0.9416 


0.9755 


0.9914 


0.9972 


0.9989 


0.9992 



In Tabic 11 wc see that, for example, the confidence interval [y{85), 1/(97)] 
defined by elements y{S5) and y{97) of the ordered sample of size = 100 
covers the quantile Qo.g of the unknown distribution of the output variable 
y with probabihty (on confidence level) /3 = 0.9523. In other words, having 
N = 100 observations for the output variable y we can state with probability 
f3 = 0.9523 that y[85] < Q0.9 < y[97], i.e. the upper limit of the interval 
(—00, Qo.q] containing 90% of the unknown distribution G{y) is covered by 



9 



[l/(85), 1/(97)] on confidence level p = 0.9523. If [y(85), 7/(97)] C [Lt,Ut], 
then the system is safe, but only on the level (0.9523|0.9). 

When we need stronger criteria of safety, then we have to find confidence 
intervals covering quantiles Q0.95 or Qo.gg with probability near the unity. As 
seen in Tables III and IV the sample size N should be greatly increased. For 
example, if we would like to construct a confidence interval for the quantile 
Q0.99 at the level of f3 = 0.9467 we need sample with ^ 700 elements. The 
production of such a large sample for even one output variable of complex 
systems is very expensive, and at the same time, there is no guarantee that 
the relation [y{r),y{s)] C [Lt,Ut] will be always satisfied, especially when 
the distribution is asymmetric. 

Table III. Confidence levels /3 for confidence intervals covering the Q0.95 unknown 
quantile in the case of sample size N = 150. (The point estimate of Qo.gs is equal to 
Q0.95 = 2/(143)). 



r\s 


144 


1456 


146 


147 


148 


149 


150 


142 


0.2909 


0.4293 


0.5382 


0.6090 


0.6456 


0.6597 


0.6633 


141 


0.4080 


0.5464 


0.6553 


0.7261 


0.7627 


0.7768 


0.7804 


140 


0.4949 


0.6333 


0.7422 


0.8130 


0.8496 


0.8637 


0.8673 


139 


0.5531 


0.6916 


0.8004 


0.8712 


0.9078 


0.9219 


0.9255 


138 


0.5886 


0.7270 


0.8359 


0.9067 


0.9433 


0.9574 


0.9610 


137 


0.6084 


0.7469 


0.8557 


0.9265 


0.9632 


0.9773 


0.9809 


136 


0.6186 


0.7571 


0.8659 


0.9368 


0.9734 


0.9875 


0.9911 



Table IV. Confidence levels jS for confidence intervals covering the Qo.gg unknown 
quantile in the case of sample size N = 700. (The point estimate of Q0.99 is equal to 
Q0.99 = y(694)). 



r\s 


694 


695 


696 


697 


698 


699 


700 


692 


0.2808 


0.4303 


0.5581 


0.6490 


0.7007 


0.7226 


0.7289 


691 


0.3826 


0.5321 


0.6599 


0.7508 


0.8024 


0.8244 


0.8306 


690 


0.4536 


0.6031 


0.7309 


0.8218 


0.8735 


0.8954 


0.9017 


689 


0.4986 


0.6481 


0.7759 


0.8668 


0.9185 


0.9405 


0.9467 


688 


0.5247 


0.6742 


0.8020 


0.8929 


0.9446 


0.9666 


0.9728 


687 


0.5387 


0.6882 


0.8160 


0.9069 


0.9585 


0.9805 


0.9867 


686 


0.5456 


0.6951 


0.8229 


0.9138 


0.9655 


0.9874 


0.9936 
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3.2.2 One-tailed test 



In order to declare that a system is operating safely on a given level, in many 
practical cases it seems to be enough to know that the value of a properly 
selected output variable y with probability near 1 is smaller than the value Ut 
prescribed by technology. In this case we should determine that element y{s) 
of the ordered sample which, with probability j3, is larger than the quantile 
of the unknown distribution G{y) of the output variable y. It means that 
the random interval (— oo,y(s)] covers the proportion larger than 7 of the 
unknown distribution Giy) of output variable y with probability 



In order to determine this probability we should substitute r = into Eq. 
(0), since according to our definition y(0) = —00. We obtain that 



where I{c,j,k) is the regularized incomplete beta function for non-singular 
cases. ^ If y{s) is smaller than Ut, then we can state: the system is safe at 
the level (/3|7). 

If s = A^, i.e. if the largest element of the sample is chosen as upper limit 
of the random interval, then one obtains the well-known formula: 



and since the probability density function of the random variable z{s) = G[y{s)] is nothing 



P = V{y{s) > Q,}. 



P = I^l^^^N-s + l,s) = Y,( .Wa-lf-\ (6) 



j=0 





(7) 



else than 



9s{u) 



u'-^ (1 -m)^-'^ 
B{s,N- s + 1) ' 



so we can write immediately that 



/3 



1 



s-l 



{1-u) 



du^ I{l--f,N - s + l,s) 



B{s,N - s + 1) 



1-1(^,3, N-s+1), 



and this nothing else than 
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Since in the engineering practice one can find misinterpretations it is not 
superfluous to underline the just proven notion of this formula: (3 is the 
probability that the largest value y{N) of a sample consisting of obser- 
vations is greater than the 7 quantile of the unknown distribution of the 
output variable y. This statement can be formulated also as follows: (3 is the 
probability that the interval (—00, y{Ny\ covers the proportion larger than 7 
of the unknown distribution G{y) of the output variable y. 

If s = A^ — 1, i.e. if the (A^ — l)-th element of the ordered sample is chosen 
as upper limit, then we get from ® the following formula: 

/? = l-7^-iV(l-7)7^-\ (8) 

the notion of which is obvious. Clearly, when (3 and 7 are flxed, and the 
second largest element of the sample is chosen for upper limit, then the 
sample size N needed to reach the level (/3|7) is obviously greater than if the 
largest element would have been chosen. For example, let the certainty level 
(0.95|0.95), then if the largest element is chosen, the sample size should be 
Nq = 58, ^ while if the second largest one is applied, the sample size has to 
be A^i = 93. However, it is at all not certain that 7/(^3)(92) < y'^^^\58). (The 
superscript denotes the sample size.) 



1 

>.0.9 

I 0.8 
o 

Q. 

0.7 



0.6 

0.92 0.94 0.96 0.98 
percentile level 

Figure 2: Dependence of the probability /3 on 7 = G (Qp) at three values of s. 

^The root of Eq. 0.95^ - 0.05 = is iV « 58.404, and we are using; the rounded value 
N = 58. In engineering practice the value iV = 59 is accepted. 







N=100 

s=100 

s=99 
-8=98 
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Figure 2 shows the dependence of the probabihty /3 on 7 when = 100 
and s = 100, 99 and 98. One can see the sharp decrease of /? when the 
quantile-level 7 approaches the unity. 

Table V. Sample sizes N^, Ni, . . . , Nq for finding elements y{s), s = Nq.Ni — 
1, . . . , Nq — 6 to be larger than quantiles Qo.gO) Q0.95 and Q0.99 of the unknown distribution 
of the output variable y with prescribed probabilities /3 = 0.90, 0.95 and 0.99, respectively. 



7\/3 


0.90 


0.95 


0.99 


s 




22 


28 


44 


No - 







37 


46 


64 


Ni- 


1 




52 


61 


81 


N2- 


2 


0.90 


65 


75 


97 


m- 


3 




78 


89 


112 


N4 - 


4 




91 


102 


127 


N5- 


5 




103 


115 


141 


Ne- 


6 




45 


58 


90 


No- 







76 


93 


130 


Ni - 


1 




105 


124 


165 


N2 - 


2 


0.95 


132 


153 


197 


N3- 


3 




158 


180 


228 


7V4 - 


4 




183 


207 


258 


Nr. - 


5 




206 


234 


287 


Ne - 


6 




229 


298 


458 


TVo- 







388 


473 


661 


TVi - 


1 




531 


627 


837 


N2 - 


2 


0.99 


666 


773 


1001 


Ns - 


3 




797 


913 


1157 


N4, - 


4 




925 


1049 


1307 




5 




1051 


1181 


1453 


Ne- 


6 



By fixing the values (3 and 7 we may calculate sample sizes Nq, Ni, . . . , 
which are needed for finding elements y{s), s = Nq, Ni — 1 , . . . , ^ k such 
to be larger than the 7-quantile of the unknown distribution of the output 
variable y with prescribed probability /3. We can see in Table V that for 
example the largest element in a sample of size N — 5S with probability 
(3 — 0.95 is greater than the quantile Q0.95 of the unknown distribution. If 
N — 234, then this statement is true for the element y(227). 

3.2.3 Illustrations 

In order to get a deeper insight into the properties of the just outlined 
method, we choose the lognormal distribution with parameters m and d 
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as the "unknown" distribution G{y). We note that this distribution arises 
when many independent random variables are combined in a multiphcative 
fashion. The density function 



0.12 

c 0.1 
o 

I 0.08 

3 

0.06 
■« 0.04 
"° 0.02 


10 20 30 40 50 

y variable 

Figure 3: Lognormal density function with parameter values m = 2.0, 2.5 and d = 0.5. 
The vertical arrows indicate the quantile Qo.95- 

can be seen in Fig. 3 when m — 2.0, 2.5 and d — 0.5. The arrows show the 
quantiles Qo.95 ~ 16.8 (m = 2) and Qo.gs ~ 27.7 (m = 2.5). 

By using Monte Carlo simulation let us generate now four sample of size 
N = 100 corresponding to lognormal distribution with parameters (m = 
2.5, d = 0.5), and denote by A,B,C and D these samples. Calculate the 
point estimates of 0.95-quantiles for each of the samples, and determine the 
shortest two-tailed confidence intervals which cover with probabihty 0.95 the 
"unknown" quantile Qo.95- In the present case we know that Qo.95 ~ 16.8 
(m = 2) and Qo.95 ^ 27.7 (m = 2.5). 

In Fig. 4 the confidence intervals are shown by vertical straight lines. 
Obviously, these intervals are random variables, hence fluctuate from sample 
to sample. In the presented example the sample D is the most unfavorable, 
because in this case we can state only that the "unknown" quantile Qo.95 is 
covered by the interval [23.29, 53.05] with probability larger than P — 0.95. 
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Table VI. Confidence intervals [y{r),y{s)] covering the "unknown" quantile Q0.95 
with probabihty 0.95. 





A 


B 


C 


D 


y{r) 


22.66 


25.21 


22.48 


23.29 


Q0.95 


27.73 


27.73 


27.73 


27.73 


vis) 


33.25 


38.28 


35.88 


53.05 


(r,s) 


(91, 100) 


(91, 100) 


(91, 100) 


(91, 100) 



0.06 
0.04 
0.02 



10 20 30 40 50 60' 



0.06 
0.04 
0.02 



10 20 30 40 50 60' 



0.06 
0.04 
0.02 



B 
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0.06 
0.04 
0.02 



10 20 30 40 50 60' 



Figure 4: Two-sided confidence intervals denoted by vertical straight lines for samples 

A,B,C and D. The intervals are calculated to be covered the true value of the quantile 
Q0.95 with probability larger than /? = 0.95. The density function is lognormal with 
parameters m = 2.5 and d = 0.5. The vertical dashed lines are indicating the true value 
of the quantile Qo.gs- 

If the upper limit Ut determined by technology would be Ut = 40, then only 
three {A, B, C) of four samples could be regarded safe at the level (0.95|0.95), 
however, sample D, which is certainly a "rare event", would decrease the 
weight of our statement. 

As mentioned, in many cases it is enough to know only the element 
y(s), s — N,N — 1,...,N — kof the ordered sample of size N for which the 
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equation 

V{-oo <Q,< y{s)} = V{y{s) > Q,} ^ 

is valid. The test based on the interval (— cx),|/(s)] is called one tailed test. 
First, determine the sample size at which the largest element of the sam- 
ple y{N) with probability {3 is greater than the quantile of the unknown 
distribution G{y) of the output variable y. II (3 — 0.95 and 7 = 0.95, then 
the largest element has to be chosen out of a sample containing N — 
elements. Produce a sample of size iV = 58 simulating the lognormal distri- 
bution with parameters m, = 2.5, d = 0.5, and call it basic sample, denoted 
by Then, repeat randomly the sample production n-times, and denote 
by y''^\y^'^\ ■ ■ ■ ,?/*^"^ the series of samples. We are interested in the largest 
elements y(-')(58), j = 1, . . . , n of samples y^-'^ j — 1, . . . ,n. 

140 [ ; 
120 

1 100 • 



E 

a> 80 • 




200 400 600 800 1000 
number of sample 

Figure 5: Largest elements of 1000 samples of size N = 58. The horizontal line corre- 
sponds to the largest element of the basic sample of size N = 58. This element is equal to 
y(^)(58) w 44.99. 

Fig. 5 shows the largest elements of n = 1000 randomly produced, inde- 
pendent samples of size N — 5S. The minimal value of the largest elements 
is 22.62, while the maximal value is 132.27. One can observe that 224 largest 
elements exceed the value y^^\58) ~ 44.99 which is the largest element of 
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the basic sample. However, this surprisingly great number is in full agree- 
ment with the statement that the interval [0, |/(''^(58)] covers the "unknown" 
0.95-quantile with probability at least 0.95. 

In order to show this, let us introduce the random variable CniQ-y) which 
gives the number of largest elements being greater than the quantile in 
y^''\ 3 — ^1 ■ ■ ■ -iT^ independent samples of size N . Since the probability that 
the largest element in a given sample is greater than is nothing else than 
1 — 7"^, hence, we conclude that 

V{UQ.) = k]=(^^ (l_yv).^.v(.-.). 

Prom this we obtain immediately that 

^{UQ',)}=n{l-l'') and J^{UQ-y)} = ^/ n l"" (1-7^)- 

As known, if n and k are sufficiently large, then the distribution of the random 
variable 

is approximately standard normal, hence we can write that 

- V {E{a(Q,)} - A T>{UQi)} < UQy) < mniQ,)} + A B{UQy)}} , 

where A is the root of Eq. 



2tt J-oO 



2 

It means that the inequality 

mn{Q^)} - A D{Cn(g,)} < UQ,) < mniQy)} + A B{UQ,)} 

is valid with probability w. 

If n = 1000, N = 58,7 = 0-95 and w = 0.95, then we obtain the values 
El^niQ-y)} = 950, 'Dl^niQ'y)} ~ 6.96 and A pa 1.96, hence we can state 
with probability 0.95 that 

936 < 6ooo(<9o.95) < 964. 
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If we count the number of largest elements 7/^^^ (58), j = 1, . . . , 1000 exceed- 
ing (5o.95 that wc know in this example (Qo.95 ~ 27, 728), we obtain the value 
949 that is indeed inside of the interval [936,964]. 

In spite of this "nice" agreement we have to underline that the require- 
ment of safety, for instance, at the level (0.95|0.95) does not exclude the 
appearance of "rare events" such as exceeding the technological limit Ut- 
Therefore, we advice stronger requirements of safety the fulfillment of which, 
of course, is much more expensive. 



3.3 Method based on sign test 

Assume again the cumulative distribution function G{y) of the output vari- 
able y to be continuous but unknown. Let = {?/!, • • • !?/Ar} be a sample 
containing the values of N observations. Define the function 



A{x) 



1, if x > 0, 

0, if x < 0, 



and introduce the statistical function 

N 

zr, = J2A{UT-yj). (9) 

j=i 

which gives the number of sample elements smaller than Ut- Criteria based 
on this statistical function are used to be named sign criteria because zn 
counts only the positive differences Ur — yj, j = 1, . . . ,N . Since we assumed 
that G{y) is continuous, hence the probability of the event {Ut — y = 0} is 
zero. 

Obvious that has binomial distribution since zn is nothing else than 
the sum of N independent random variables with values either or 1. By 
using the notation 

V{A{Ut - y) = 1} = V{y < Ut) = p, (10) 

we can write 

r{z^^j} = {^^iP {i-pf-\ (11) 

i = 0,l,...,7V. 
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The task is very simple. Assume that we have a sample of size and 
for this sample = k < N. We should determine a confidence interval 
[7i(fc), 7c/(/c)] which covers the value p with a prescribed probability p. The 
unknown p defined by (fTUI) is nothing else than the probability that the 
output variable y is not larger than the technological limit Ut- When the 
lower confidence limit 'jiik) is near the unity, then, since 7L(fc) < p, we 
can state at least with probability (3 that the chance of finding the output 
variable y smaller than Ut is also near the unity, and so the system operation 
can be regarded safe at the level [/3|7l(A;)]. 

3.3.1 Approximate calculation 

If the sample size A^ > 50, then the random variable 

k- Np 



^Np (1 - p) 



has approximately standard normal distribution, where k is the number of 
sample elements not larger than Ut- Let (3 be the confidence level, then we 
can write that 



r{\Ck\<up} = v 



\k - Np\ 
^Np (1 - p) 



< 



Uf3 



where ^{x) is the standard normal distribution function. This equation can 
be rewritten ^ in the following form: 

v{\c,\<up}=v{a<ui} = 



V{{N + ul){p - 7l)(p - lu) < 0} = /3, 



where 

IL = liik^up) 

and 

7r/ = lu{k,Uf3) 



N + u} 



N + u} 



k + \ul- up^k{l-k/N)+uyA 
k + \ul + up^k{l-k/N)+uyA 



(12) 



(13) 



(14) 



^Thc following elementary considerations can be found in any textbook for statistics, 

e.g. m- 
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It is obvious that [p — 7l(A;, li^)]];) — ^u{k, ti/j)] < is fulfilled only, if 
and therefore 

r{\Ck\ < M = 'PbLik,up) <P< 7uik,up)} = 13 (15) 
where up is the root of Eq. 

This equation shows clearly that the interval [7l(A;,m^), 7c/(fc, M/j)] covers the 
unknown p with probability (5. 

In many cases wc do not need the restriction due to the upper confidence 
limit. We want to know only the probabihty of the event {'-)L{k^vp) < p}. 
Since Cfc at fixed is a decreasing function of p, the events {Qk < Vjj] and 
{7l(A;, Vf)) < p} are equivalent, and so we can write 

V{Ck < vp} = vp) < p} = ^{vp) = 13. (16) 

Consequently, the operation of a system can be regarded safe if the param- 
eter p for all output variables is covered by YfLik^vp), 1] with a prescribed 
probability /9, provided that ^L{k,vp) is near the unity. ^ For the sake of 
simpler notation in the sequel jL{k,vp) and ju{k,up) will be denoted by 7^ 
and 7c/, respectively. 

The event {y < Ut} belonging to the acceptance region of the sample 
space will be called success. Now, let us calculate the number of successes k 
needed in a sample of size N to ensure a fixed confidence level (3 and a given 
lower confidence limit jl- 

Table VII. Numbers of sample elements k in samples of size N = 100(10)200 
needed for the acceptance on level /3 = = 0.95. (The approximate formula (13) has 
been used for calculations.) 



k 


99 


108 


118 


128 


137 


147 


157 


166 


176 


185 


195 


N 


100 


110 


120 


130 


140 


150 


160 


170 


180 


190 


200 



'It is obvious that 7L(fc,U/3) > 7l(A;,U/3). 
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In Table VII we see the numbers of successes needed in samples of size 
= 100(10)200 in order to reach the level /5 = 7l = 0.95. The requirement 
is quite sever: if the sample size N = 100 one should have A; = 99 successes! 

For illustration of the method the approximate '^l < P values have been 
calculated at confidence levels /3 = 0.90(0.01)0.99 when the sample size N — 
100 and the number of successes k = 90(1)100. The results are shown in Table 
VIII. It can be seen, for example, that if the event {y > Ut} occurs only once, 
then it can be stated with probability f3 = 0.95 that 71, = 0.9564 < p. It 
means that the appearance of "dangerous" events {y > Ut} is not excluded 
even if the level of acceptance is better than (0.95 1 0.9564). 

Table VIII. Approximate Jl < P values calculated at confidence levels (3 = 
0.90(0.01)0.99 for numbers of success k = 90(1)100. Sample size N = 100. 



k\!3 


0.90 


0.91 


0.92 


0.93 


0.94 


0.95 


90 


0.8549 


0.85245 


0.8498 


0.8469 


0.8435 


0.8396 


91 


0.8664 


0.8640 


0.8615 


0.8586 


0.8553 


0.8515 


92 


0.8781 


0.8758 


0.8733 


0.8704 


0.8672 


0.8635 


93 


0.8899 


0.8877 


0.8852 


0.8825 


0.8794 


0.8757 


94 


O.OOl!) 


0.8997 


0.8974 


0.8947 


0.8917 


0.8882 


95 


0.9141 


0.9120 


0.9097 


0.9072 


0.9043 


0.9008 


96 


0.9266 


0.9246 


0.9224 


0.9200 


0.9171 


0.9138 


97 


0.9394 


0.9376 


0.9355 


0.9331 


0.9304 


0.9273 


98 


0.9528 


0.9511 


0.9491 


0.9469 


0.9444 


0.9414 


99 


0.9672 


0.9655 


0.9637 


0.9617 


0.9593 


0.9564 


100 


0.9838 


0.9823 


0.9806 


0.9787 


0.9764 


0.9737 



k\f3 


0.96 


0.97 


0.982 


0.99 


90 


0.8350 


0.8292 


0.8213 


0.8085 


91 


0.8470 


0.8413 


0.8335 


0.8208 


92 


0.8591 


0.8535 


0.8458 


0.8333 


93 


0.8714 


0.8659 


0.8584 


0.8460 


94 


0.8839 


0.8786 


0.8712 


0.8591 


95 


0.8967 


0.8915 


0.8843 


0.8724 


96 


0.9099 


0.9048 


0.8978 


0.8861 


97 


0.9235 


0.9186 


0.9117 


0.9003 


98 


0.9377 


0.9330 


0.9264 


0.9152 


99 


0.9529 


0.9484 


0.9420 


0.9311 


100 


0.9703 


0.9658 


0.9505 


0.9487 
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3.3.2 Exact calculation 

When the sample size N is smaller than 50 we cannot apply the asymp- 
totically valid normal distribution. For the exact calculation of confidence 
limits we used a slightly new version of the method proposed by Clopper and 
Pearson 

The probability of finding at least k successes from observations is 
nothing else than 

^r(P) = EHp^(i-pr^ (17) 

j=o ^ 

where 

P = V{y<UT}. 
As known, this formula can be written in the form: 



[1-v)'' v^~'''Uv, (18) 



k\ {N-k-l)\ Jq 



and it is obvious, that S^^^ (p) is a continuous monotone decreasing function 
of p, since 

= ^ /(1-p)— <0. 

dp k\ {N-k-l)\^ ^ ^' 

Taking into account that 

' 1, ifp = 0, 



0, ifp=l, 



it is evident that S\^\p) assumes any values in the interval [0, 1] only once. 
Consequently, a p = ps value can be determined so that 



Si''\ps)=5, V0<5<1. 
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Since Sl!^\p) is a monotone decreasing function, if p > ps, then 



sf\p)<Sr>{ps)=S. 



1 

(A 0.95 
I 0.9 

0.85 

1 0.8 

I 0.75 
u 
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Figure 6: Dependence of the upper and the lower confidence limits on the number of 
successes k at confidence level f3 = cl = 0.95 in cases of sample size TV = 50 and 100, 
respectively. 
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Clearly, the function 



Rf\p)=i-sn{p) 

will satisfy the inequality 



N 

E 

j=k 



6, 



p> (1 -p) 



N-j 



(19) 



if p < ps- 



Fixing the confidence level (3 one can obtain the upper confidence limit 7^7 



for the unknown parameter p from S"^ 



(TV) 



confidence limit 7l is determined by R^^\'-)i 



< 



2(1 — while the lower 

L) < ^(1 — P)- Now one can 
formulate the statement that the random interval [jl, "Ju] covers the un- 
known parameter p with probability j3. 
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Figure 7: Dependence of the the lower confidence hmit on the number of successes k on 
three confidence levels f3 — cl ^ 0.90, 0.95, 0.99 when the sample size N = 100. 

For the sake of illustration Fig. IHl shows the dependence of the upper 
and the lower confidence limits on the number of successes k on confidence 
level j3 = 0.95 in cases of sample size = 50 and 100, respectively. For 
example, if k = 98, i.e. two observations out of = 100 are failed, then we 
can state with probability 0.95 that the unknown p is covered by the interval 
[0.9296, 0.9975]. 

As mentioned already in many practical situations it suffices to know that 
the interval [7^, 1] calculated from the sample of A^ observations covers the 
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chance of success p = V{y < Ut} with prescribed probabihty /3. Fig. 7 
shows the dependence of the the lower confidence hmit 7/, on the number 
of successes k at three confidence levels (3 = d = 0.90, 0.95, 0.99 when the 
sample size = 100. 

Table IX. Lower confidence limits at three levels when the number of successes 
k = 90(1)100. Sample size N = 100. 



I3\k 


90 


91 


92 


93 


94 


95 


0.90 


0.8501 


0.8616 


0.8733 


0.8850 


0.8970 


0.9092 


0.95 


0.8362 


0.8482 


0.9602 


0.9725 


0.8850 


0.8977 


0.99 


0.8086 


0.8212 


0.8340 


0.8471 


0.8604 


0.8741 



13 \ k 


96 


97 


98 


99 


100 


0.90 


0.9216 


0.9344 


0.9476 


0.9616 


0.9772 


0.95 


0.9108 


0.9242 


0.9383 


0.9534 


0.9704 


0.99 


0.8882 


0.9030 


0.9185 


0.9354 


0.9549 



Table IX contains the 7^ values plotted in Fig. 7 for the mostly used 
confidence levels provided that the sample size N — 100. It is remarkable 
that even in that case when k — 100, i.e. when all elements of a sample can be 
found in the acceptance interval we can state with probability f3 = 0.95 only 
that the unknown p value is covered by the interval [0.9704, 1], or simply, 
but not precisely: the p is larger than 0.97 with probability 0.95 One can 
imagine a number of cases where this statement is not enough to declare: the 
operation of the analyzed system can be regarded safe. 

3.4 Tolerance interval method 

Assume again that we have N independent values yi, . . . ,yisr of the output 
variable y. Let 7 and f3 be positive numbers not larger than 1. Now, we wish 
to answer the following question: On the basis of a sample Sn = {yi, ■ ■ ■ ,yN} 
can we state that a fraction larger than 7 of the distribution G{y) lays with 
probability (3 in an interval [L,U] C [L^, Ut] ? 

In order to answer this question, let us construct from the sample Sj^ two 
random functions L — L{yi, . . . , yjv) and U — U{yi, . . . , yjv), called tolerance 
limits, such that 

r{J^ dG(y)>^}^(3. (20) 
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We remark that 

u 

dG{y)=A{y^,...,yN) (21) 

is a random variable, sometimes called probability content, which measures 
the proportion of the distribution included in the random interval [L, [/]. 
Probability (3 bears the name confidence level. For safe operation it is advis- 
able to specify the probability content 7 and the confidence level f3 as large 
as possible in the interval (0, 1). 

Having fixed /? and 7, from definitions of L{yi, . . . , y^) and U {yi, . . . , t/at) 
it becomes possible to determine the number of runs N. Carrying out N 
runs, we get a sample {yi, . . . ,yN}, from which we can calculate an appro- 
priate tolerance interval If that interval lies in [Lx,Ut] we declare 
the operation safe. ^ This program can be easily realized when the distribu- 
tion G{y) is known and normal, however, in subsection 3.4.1 the problem of 
distribution free tolerance interval will be discussed. 



3.4.1 Distribution free tolerance limits 

To solve the problem of setting tolerance limits when nothing is known about 
the cumulative distribution function G{y) except that it is continuous, seems 
to be not an easy task. Exploiting advantages of the order statistics, Wilks jS] 
was the first who found a satisfactory solution to the problem and somewhat 
later Robbins ^U] published a nice proof that distribution free tolerance limits 
can be given only by means of order statistics. 

It is evident that in the order statistics we are unable to exploit the total 
amount of information which is present in the sample when the distribution 
function G{y) is unknown. Consequently, with 7 and P given, we anticipate 
either a wider tolerance interval around the sample mean or a larger sample 
size to achieve the same tolerance interval as in the case of known G{y). 
Not going into details, we give here a well-known theorem, which is useful in 
uncertainty and sensitivity analysis of codes. 

Theorem 2 Let yi,...,yN be N independent observations of the random 
output y. Suppose that nothing is known about the distribution function G{y) 
except that it is continuous. ^'^ Arrange the values of yi, . . . ,yN in increas- 

^Many authors have discussed the problem of settmg tolerance limits for a distribution 
on the basis of an observed sample. The pioneering work was done by S. S. Wilks j8| and 
by A. Wald 0. 

^"^It can be shown that the one-sided continuity only is needed. 
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ing order, and denote by y{k) the k-th of these ordered values; hence in 
particular 

y{l) = min yk, y{N) = max yk, 

l<k<N l<k<N 

and by definition y{0) = —oo, while y{N + 1) = +00. In this case for some 
positive 7 < 1 and (3 < 1 there can be constructed two random function 
L{yi, . . . , ?/7v) O'lT'd U{yi, . . . , Pn), called tolerance limit, such that the proba- 
bility that 

^ dG{y) > 7 

holds is equal to 

(3=l-I{^,s-r,N -s + r + l) = ^ ( . j 7^' (1 - 7)^"^ (22) 
where 

,M,>^>^r ^-'!^-f-' ^u, By..)^(i^iii(^. (23) 



B(j,k) • ' U + k-l)\ 

< r < s < N, and L = y(r), U = y{s). 

The proof of Theorem 2, which is a simphfied version of Wald's proof, is 
given in Appendix II. 

The selection of tolerance limits L = y{l) and U = y{N) appears to be 
expedient in many cases. Substituting r = 1 and s = in Eq. ()22|) . we get 
for the two-sided tolerance interval the expression 

/3 = 1 - 7^ - Ar(l - 7) 7^-1. (24) 

Often we are interested solely in the upper tolerance limit U = y{N) and 
we call the interval [y{0),y{N)] one-sided tolerance interval. Now r = and 
s = N, therefore 

/3 = 1 - 7^. (25) 

When the lower limit is of interest, we select [y{l), y{N -\- 1)] and this is also 
a one-sided tolerance interval. Substituting r=l and s=N+l into expression 
we obtain again. 

'^The probability that equal values occur is zero. 
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Finally, we make two remarks. Two outputs are considered the same if 
their difference is smaller than the round-off error. Therefore the probability 
that two runs yield the same output is very small but not zero. The second 
remark is that expressions (j23)-(I23) inay appear as a relationship between 
two probabilities /3 and 7. However, 7 is not a probability, which can be seen 
from the nonsensical interpretation for 7 from any of the mentioned expres- 
sions. In Table X. we compiled the probability content 7 of the tolerance 
interval [y{l),y{N)] for (3 = 0.9,0.95,0.99 and = 10(10)100(25)300. 

If we are interested in a tolerance interval [L, U] which includes larger 
than 7 = 0.953 proportion of the distribution of the output with probability 
j3 = 0.95, then we should make 100 runs, see Table X. and select the lowest 
output as L and the largest as U. If U is smaller than the technological limit 
Ut, then the system is safe at the level 7 = 0.953, (3 = 0.95. This means 
that additional runs may produce an output exceeding U but this portion of 
runs is not larger than 4.7% of the total number of runs. However, these rare 
output values may be greater than the technological limit Ut- Evidently, if 
U is larger than Ut, the system must be declared unsafe. 

Table X. 7 values of tolerance interval [y{l), y{N)] for /3 = 0.9,0.95,0.99 and 
N = 10(10)100(25)300. 



N 


7 values 




f3 = 0.90 


/3 = 0.95 


P = 0.99 


10 


0.66315 


0.60584 


0.49565 


20 


0.81904 


0.78389 


0.71127 


30 


0.87643 


0.85141 


0.79845 


40 


0.90620 


0.88682 


0.84528 


50 


0.92443 


0.90860 


0.87448 


60 


0.93671 


0.92336 


0.89442 


70 


0.94557 


0.93402 


0.90890 


80 


0.95225 


0.94207 


0.91989 


90 


0.95747 


0.94837 


0.92851 


100 


0.96166 


0.95344 


0.93554 


125 


0.96924 


0.96262 


0.94813 


150 


0.97432 


0.96877 


0.95658 


175 


0.97796 


0.97318 


0.96268 


200 


0.98069 


0.97650 


0.96736 


225 


0.98282 


0.97909 


0.97087 


250 


0.98453 


0.98118 


0.97375 


275 


0.98593 


0.98287 


0.97618 


300 


0.98710 


0.98429 


0.97809 
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Fi gure 8: The dependence of the probabihty (3 = pr on the the number of runs N at 
probabihty contents 7 = pc 0.8, 0.9, 0.95, 0.98 0.99. 

To get some insight into relation ()24j) we present the probabihties (3 ver- 
sus N for six 7 values, see Fig. 8. With increasing number of runs, each 
interpolated curve reaches saturation, and (3 tends to unity as tends to 
infinity. The smaller is the 7 value the sooner comes the saturation, because 
small 7 means that only a small fraction of the calculated output is required 
to fall into the given interval. 

Small 7 value is not acceptable in safety analysis for small 7 means that a 
large portion of output values may fall outside the tolerance interval. Practi- 
cally we need 7 > 0.95. For example, if we wish the tolerance interval [L, U] 
to include larger than 7 = 0.98 proportion of the output values with proba- 
bility (3 = 0.95, we need approximately 235 runs in order to get the proper 
L and U. In spite of the large number runs the probability content 7 = 0.98 
is far from being completely satisfactory. To achieve a better probability 
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content, say 7 = 0.99 with probability (3 = 0.95, we need 473 runs, which is 
practically hard to realize. 



3.4.2 Known cumulative distribution function 

Let us assume the cumulative distribution function G{y) to be known. How- 
ever, one should emphasize that there are situations where it would be par- 
ticulary dangerous to make unwarranted assumptions about the exact shape 
of distribution G{y). In general, the attempt to get an explicit expression 
for jS by means of expression (j20|) would fail. There is however one excep- 
tion, when G{y) is of normal distribution N{m, a) then exact formula can be 
obtained for p. 

We shall denote by i/N the sample estimate of the expectation value m 
and by ajf that of the variance cr^, i.e. 



1 ^ 1 ^ 

k=l k=l 

Let us construct two random variables, viz. 

L = . . . ,?/Ar; A) = ^Af-A o-Tv and U = U{yi, . . . ,yN; X) = yN + >^ ^n, 

where the parameter A scales the length of the interval [i^, f/]. Denote by 
A{yN, A5"7v) the proportion of the output distribution included between the 
limits L{yi, . . . ,yN] \) = yw - Aa^v and f/(?/i, . . . , y^; \) = y^ + Xa^, i.e. 

A{yN, Acttv) = j g{y) dy = — =- J exp[ — — ] dy. (27) 



Introducing new variable z = {y ~ m)/a we obtain 

A{m + azj^, X(7]\r) = p{z]\r, Sn) = —= / ''^ dz, (28) 
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where 



In 



Vn -rn ^ aN 

zn = and sn = — . 

a a 



^^It is worth mentioning that if output variable ?/ is a sum of a large number of small, 
statistically independent random variable, then its distribution is almost normal. Now we 
discuss the case when the output variable y is of normal distribution. 
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while 

^n = zn — ^sn and un = zj^ + Xsn. 

We stress again that p{zN,S]\f) is a random variable because in expression 
fl28|) the limits of the integral are random variables. 

Theorem 3 For any given positive value of A the probability that p > j, 
where << 7 < 1 is expressed by 



W{X,j,N) = l-^— I Kr^-i 



00 



(iV-1) 



A 



e-^'^'/^ dp, 



(29) 

where Kn_i[- ■ ■] is the distribution with {N — l)-degrees of freedom and 
q{p, 7) is the solution of the equation 



e-y^dx = ^. (30) 

fi-q 
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The value A determining the tolerance interval at a preassigned probability 
content 7 and a preassigned significance level j3 in the case of N runs can be 
calculated from the equation 

W^(A,7,iV)=/5, (31) 

and it is independent of unknown parameters m and a of the distribution 
function G{y). The equation / f37]) has exactly one root in X, since W{X, 7, A^) 
is a strictly increasing function of X. 

Proof of Theorem 3 is given in Appendix III, since the mathematical 
details are not relevant to the aim of the present work. However, it is worth 
mentioning that an approximate tolerance interval can be derived when N 
is large (e.g. > 50). 

^^If one-sided tolerance interval with upper limit is needed, then Eq. should be 

replaced by 



1 



/27r 



e"^ dx = 7. 
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Theorem 4 The approximate two-sided tolerance interval is given by 
where 

Here Qn-i^^ — P) is (1 — /5) -percentile of the distribution with [N — 1) 
degree of freedom and g(l/\/iV, 7) is the root of the equation 



271 J^-q 



e 



^ /2 dz = 7. (33) 



The Xa for the approximate one-sided tolerance interval with upper limit can be calculated 
in the same way, but Eq. has to be replaced by 



V2n 

Proof of Theorem 4 is given in Appendix IV. 

Table XI. A values of two-sided tolerance intervals for the number of runs Af=50(5)100 





f3 = 0.90 


13 = 0.95 


13 = 0.99 


iV\7 


0.90 


0.95 


0.99 


0.90 


0.95 


0.99 


0.90 


0.95 


0.99 


50 


1.916 


2.284 


3.001 


1.996 


2.379 


3.126 


2.162 


2.576 


3.385 


55 


1.901 


2.265 


2.976 


1.976 


2.354 


3.093 


2.130 


2.538 


3.335 


60 


1.887 


2.248 


2.956 


1.958 


2.333 


3.066 


2.103 


2.506 


3.293 


65 


1.875 


2.234 


2.936 


1.943 


2.315 


3.042 


2.080 


2.478 


3.257 


70 


1.865 


2.222 


2.920 


1.929 


2.299 


3.021 


2.060 


2.454 


3.225 


75 


1.856 


2.211 


2.906 


1.917 


2.285 


3.002 


2.042 


2.433 


3.197 


80 


1.848 


2.202 


2.894 


1.907 


2.272 


2.986 


2.026 


2.414 


3.173 


85 


1.841 


2.193 


2.882 


1.897 


2.261 


2.971 


2.012 


2.397 


3.150 


90 


1.834 


2.185 


2.872 


1.889 


2.251 


2.958 


1.999 


2.382 


3.130 


95 


1.828 


2.178 


2.862 


1.881 


2.241 


2.945 


1.987 


2.368 


3.112 


100 


1.822 


2.172 


2.854 


1.874 


2.233 


2.934 


1.977 


2.355 


3.096 



In order to give an impression of A values (i.e. of the tolerance interval 
around the sample mean of the output variable), Table XI. contains the A 



32 



0.5 [, 

20 40 60 80 100 

sample size 

Figure 9: Dependence of the confidence level /? on the sample size N at probability 
contents ^ — pc = 0.95, 0.97, 0.98 when the interval parameter X = ip = 2.5. 

values associated with often used 7 and P for the sample sizes iV=50(5)100. 
One can see that at A^=100 the tolerance interval which includes 95% of the 
distribution with 95% probability is given by 

[Vioo - 2.235-100, Vioo + 2.23crioo] • 

If that interval lies within [Lt, Ut] then the system is safe on level 7 = 0.95 
and /3 = 0.95. 

Fig. 9 shows convincingly the interrelations between the basic character- 
istics of the tolerance intervals for a normal distribution. As expected the 
confidence level jS increases with increasing sample size provided that the 
coverage pc = j and the interval parameter ip = X are fixed. 

However, if the fixed coverage 7 exceeds a critical value •jcrt ~ 0.98758 
when X = ip = 2.5, then one can observe an "anomalous" behavior of the 
dependence (3 on N, as shown in Fig. 10. It is seen that the probability 
P of finding the proportion 7 > •jcrt of the distribution G{y) in the interval 
{zn — X sn, zn + X sn) decreases with increasing sample size > Ncrt, 
where Ncrt depends on both A and 7. The explanation is straightforward: 

^''More detailed tables can be found in 11 . 

^^If one-sided tolerance interval with upper limit is needed, then A — 2.23 has to be 
replaced by A = 1.75! 
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ip=2.5 
pc= 0.985 
pc= 0.990 
■ pc= 0.995 
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Figure 10: Dependence of the confidence level /3 on the sample size N at probability 
contents larger than the critical value ^crt ~ 0.98758 and provided the interval parameter 
X = ip = 2.5 is fixed. 
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Figure 11: Dependence of the confidence level /3 on the interval parameter A = at 
probability content — pc — 0.95 for three sample sizes N = 40, 50, 60. 



Since 

Af^oo N 



lim = and lim sn — 1, 
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2 2.2 2.4 2.6 2.8 3 
interval parameter 



Figure 12: Dependence of the confidence level (3 on the interval parameter X = ip at 
probability content 7 = pc = 0.99 higher than the critical value for three sample sizes 
N = 40, 50, 60. 



it is evident that 

Mm p{zn,sn) = 'Jcrt, where 7^^ = ^= / e"^'/^ c?x, 
^-»oo V27r J-x 



consequently, if 7 = •Jch + where < 6 < 1 — jcrt, then 

hm V{\p{zn, sn) - 7crt\ > 5} =0, 

i.e. VIIp^zn, sn) — 7crt| > ^} is a monotonously decreasing function of 
> Ncrt- It easy to show that 

V{\p{Zn, Sn) - lcrt\ > 5}> V{p{zN, Sn) > Jcrt + 5} = P, 



^^Introducing the notations: 

{p{zn,Sn) < Jcrt - (5} = -4^^'' 



and 

{pIzn^Sn) > Jcrt + S} ^ 

and taking into account that -A^'' n A'}^ ' = 0, we can write that 

V{\p{Sn, Sn) - 7crt| >S}^ V{A''j^^ U ^^^} > V{piSN, Sn) > Icrt + 5}. 
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and so one can state that (3 decreases with increasing N > N^rt if 7 > 7crt 
provided A is fixed. 

It is not superfluous to know how docs the confldence level j3 depend on 
the interval parameter A at a fixed probability content (coverage) 7 and at 
a given sample size N. Fig. 11 shows this dependence at 7 = pc = 0.95 for 
three sample sizes N — 40, 50, 60. What we see completely corresponds to 
our expectations, however, as seen in Fig. 12, the character of (3 vs. A curves 
is radically changing. The explanation is the same as in the case of Fig. 10. 



4 Several output variables 



Now we assume the output to comprise n variables. Let these variables be 
Hi, . . . ,yn- If they are statistically completely independent we can apply 
the results of previous Sections, otherwise we need new considerations. Let 
G{yi, . . . , yn) be the unknown joint cumulative distribution function of the 
output variables, furthermore, let 



( yu yi2 

y2i y22 

V ym yn2 



yiN \ 

y2N 

ynN J 



(34) 



be the sample matrix obtained in N » 2n independent observations (runs). 
Introducing the n-components vector 



yk 



( yik \ 
y2k 



\ ynk / 

the sample matrix can be written in the form: 

Sn = {yi,---,yiv}. 

By using proper statistical methods for testing the sample matrix we can 
make useful probabilistic statement about the safety of system operation. 



^^There are many fairly good statistical tests to prove the independence of random 
variables. 
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First, we will show how to generalize the method of sign test for several 
output variables, and then we will deal with the problem of setting tolerance 
limits for more than one random variable. 



4.1 Sign test 

For the sake of simplicity we are going to deal with two output variables 
yi and y2 provided their joint distribution function G{yi,y2) is unknown, 
but continuous at least from right (or from left) in both variables. Let us 
accept that the system operation can be declared safe if the requirement 
{yi < u!p , y2 < U^^ } is reahzed with probability 

Pi2 = n?/i<4'\ z/2<4'^} (35) 

near the unity. Here U^^ and U^^ are the limit values defined by technology, 
and they define the acceptance region of the (1/1,1/2) plane. Since the pi2 
is unknown, the task is to construct from the sample a confidence interval 
[il' ' 7c/' ] which covers the pi2 with probability ^12- In most of the cases 
it is sufficient to calculate the 7})'^'' only and to use the interval [7},^'^'', 1] as 
confidence interval. Let the 2-components vectors 

\y2kj 

be elements of a sample 5^ obtained by independent observations. One 
should emphasize that yj and yk are independent if j 7^ k, but the components 
of a given sample vector are not. 

In order to use a terminology as simple as possible, the event {yi < 
y2 < U^^} will be called success. Define now the function 



A a 



and introduce the statistical function 

N 



1, a yik < U^' emd y2k < , 
otherweise. 



(1,2) _ 

k=l 



J2 A (4'^ - yik) A (4^ - y2k) , (36) 
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(1 2) 

which gives the number of successes in a sample of size N. Since Zj^f' ' is 
the sum of independent random variables with values either 1 or 0, it is 

(1 2.) 

obvious that ^j^' is of binomial distribution. By using the notation 



V{A (4^) - y,) A (4^) - 1/2) = 1} 



we can write 



7^{4'^) ^k}^ (^^^pt^ (1 - Pi2)^-', V ^ = 0, 1, . . . , TV, 

and this is the point where we can use from the results of Subsection 3.3. 

Now, we would like to make a trivial but important amendment. Define 
two statistical functions: 



N N 

(1) 



J2^{uf^-yu) and 4^ = J2^{u¥'-y2j) 

i=l j=l 



Clearly, z^j^^ and z^^^ are not independent, but both of them are sum of 
independent random variables with values either 1 or 0, consequently one 
can write 

.(1) _ .-1 _ /^M^Ul ^ \N-i 



and 



n4^ = ^} = (^)pi(i-pi) 

i,j = l,...,iV, 



where 



Pe 



= V{yi < U?} = V{A {U? - = 1}, 
£ = 1,2, 

are unknown probabilities. By using the samples iS^"* = {yu, i = 1, . . . , N} 
and iS^"* = {y2j, j = l,---,^} separately with help of the method de- 
scribed in Subsection 3.3 we can construct two random intervals [7^^^'*, 1] and 
[y^^\ 1] covering pi as well as p2 with probabilities /3i and l32, respectively. 
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Obviously it could be occurred that the levels (AITl ) (P^Itl ) support 
the statement that the samples and S^^ separately do not contradict to 
the requirement of safe operation, however, from this one cannot conclude 
that the operation of the system is safe on a preassigned level for variables 
yi and 1/2 tested jointly. The reason is clear: the output variables yi and 
y2 are not independent, and in this case we have to know weather the value 
P12 = < U^\y2 < U^^} is covered by the interval [7^"^'^^ 1] with a 

preassigned probability f3i2. Clearly, 7^^'^'* < min{'-f^j^\'~f^j^^}, therefore 7^"^^ 
and 7}^ do not contain sufficient information to declare that the operation 
of the system is safe. The procedure should be as follows: firstly test the 
hypothesis that the output variables yi and y2 are dependent, and if this is 
the case, estimate the probabihty of the event {yi < U^\y2 < U^^}, and 
not the events {yi < U^^} and {^2 < U^^} separately. 

Finally, we would like to note that the generalization of the sign test for 
n > 2 output variables is straightforward: we have to use the statistical 



k=i j=i 

in order to obtain the sum of N independent random variables, and then the 
further steps will be the same as they were in Subsection 3.3. 

4.1.1 Illustration 

Now we want to present an example to show how the sign test method is 
working. By using Monte Carlo simulation we have generated two samples a 
and b. Both are consisting of = 100 value pairs due to the population of a 
bivariate normal distribution with parameters mi = m2 = and Ui = (J2 = 1, 
but the correlation coefficient is C = 0.1 in a, while C = 0.7 in b. 

One can see in Fig. 13 that in the sample a four, while in the b two 
observations out of A?" = 100 can be found in rejection region. 

From Table IX. one can read that in the case of sample a the inter- 
val [0.9108, 1] covers the parameter pi2 with probability (3i2 = 0.95, at 
the same time both pi and p2 are covered by the interval [0.9383, 1] with 

= (32^ 0.95. The level (0.9383|0.95) is not "very good", but better than 
(0.9108|0.95), however, in the decision about the safety one should take evi- 
dently into account the level calculated for the parameter pi2, and not those 
calculated separately for pi and p2- 



function 



N 



n 
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Figure 13: Sample vectors denoted by points in the sample plane (2/1,2/2) and the ac- 
ceptance region defined by technological requirements. Upper figure (sample a) and lower 
figure (sample b) refer to correlation coefficients C = 0.1 and C = 0.7, respectively. 



Testing the sample b which shows a strong correlation between the vari- 
ables yi and 1/2, we find that the confidence interval [0.9383, 1] covers the 
parameter pi2 with probability (3i2 — 0.95. Consequently, we can state with 
probability 0.95 that the chance of the event {yi < Uj^\y2 < U^^} is higher 
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than the value 0.9383, i.e. we are able to declare that the operation of the 
system is safe on the level (0.95|0.9383) only. The parameters pi and p2 
are covered by intervals [0.9534, 1] and [0.9383, 1], respectively, with the pre- 
scribed probability (3i = (32 = 0.95, however, these values are not informative 
for the safety of the system. 

This simple example shows convincingly that the tests performed sepa- 
rately on output variables which are depending on one another could bring 
about false decision concerning the safety of the system operation. 

4.2 Tolerance region 

The problem of setting tolerance limits for output variables . . . ,?/„ can 
be formulated as follows. Assume that the unknown joint distribution func- 
tion G{yi, . . . ,yn) is absolute continuous, i.e. it has a joint density function 
g{yi, . . . ,yn)- For some given positive values 7 < 1 and (3 < 1 we have to 
construct n pairs of random variables Lj{yi, . . . , y„) and Uj{yi, . . . , y„) j = 
1, . . . , n such that the probability that 

rUi rU„ 

1 ■■■ giVu ■ ■ ■ ,yn) dyi- ■ ■ dyn > 7> (37) 

holds is equal to p. A natural extension of the procedure applied previously 
to the one variable case would seem the right selection. Unfortunately that 
choice does not provide the required solution since the probability of the 
inequality (jHTj) depends on the unknown joint density function g{yi, . . . , ?/„). 
Our task is to find a reasonable procedure such that the probability f3 is 
independent of g{yi, . . . ,yn)- It can be shown that such a procedure exists 
but its uniqueness has not been proven yet. 

Since the distribution function G{yi, . . . ,yn) is continuous, we can state 
that no two elements of the sample matrix are equal. The sequence of 
rows in the sample matrix 5^ can be arbitrary, reflecting the fact that we 
number the output variables arbitrarily. 

Let us choose the first row of the sample matrix, and arrange its elements 
in order of increasing magnitude ?/i(2), . . . ,yi{N). Select from these 

?/i(ri) as Ll and yi(si) > yi(ri) as Ui. Let ii,i2, ■ ■ ■ ,isi-ri~i stand for the 
original column indices of elements ?/i(ri + l),?/i(ri + 2), . . . ,yi{si — 1). In 
the next step, choose the second row, the N observed values of the output 
variable 7/2 and arrange the part 2/2ii, 2/2*25 ••• 5 l/2jsi-ri-i of its elements in 
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increasing order to obtain ^2(1) < ?/2(2) < ■ ■ ■ < y2(si — ti — 1). From among 
these, 1/2(^2) and 1/2(52) > ?/2(^2) arc selected for L2 and [/2 and evidently 
r2 > ri, S2 < Si — ri — 1. We continue this imbedding procedure to the last 
row of the sample matrix and define a n-dimensional volume 

Vn = {[Li, C/i] X [L2, C/2] X • • • X [Ln, Un]}, 

where 
and 

> Tj-i > • • • > ri, 

while 

rj < Sj < - rj_i - 1, V j = 2, . . . , n. 

Theorem 5 /n i/ie case ofn>2 dependent output variables with continuous 
joint distribution function G{yi, . . . ,yn) it is possible to construct n-pairs of 
random intervals [Lj, Uj], j — l,...,n such that the probability of the 
inequality 

I ■■■ 9{yi,---,yn) dyi---dyn > 7, 
is free of g{yi, . . . , yn) and is given by 

'P \ I ■ I 9{yi: ■■■^Vn) dyi---dyn> ^ 

= 1-7(7, s„-r„,A^-s„ + r„ + l) (38) 
Here function /(•••) is the regularized incomplete beta-function and 

n—l 

Sn < Sn-1 - Tn-i - 1 < Si - ^(r^ + 1) and rn > Tn-i > ■ • • > n. (39) 

Proof of Theorem 5 is given in Appendix V. 

^^This n-dimcnsional volume is the tolerance region which is nothing else than a subspace 
of an n-dimensional Euclidian space. 
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4.2.1 Illustrations 



In several practical applications the choice ri = r2 = ■ ■ ■ = = 1 and 
Sn = N — 2{n — 1) can be advised, hence the confidence level (3 for a two- 
sided tolerance region is given by 



p = 1- I {-f,N -2n + l,2n) = ^ ( . J 7^' (1 - 

i=o ^ ^ 



7)^-^. (40) 



The structure of expression ()40p is remarkably similar to that of expression 
fl22|) . which refers to the one output variable case. Furthermore, if the lower 
limits Lj = —00, V j = 1, . . . , n, i.e if 

ri = r2 = ■ ■ ■ = r„ = and Sn = N — n + 1, 

then one obtains the confidence level 

/AT\ 

/5=l-/(7,iV-n+l,n) = ^ ,U^(l-^f-^ (41) 

j=o V J / 

for one-sided tolerance region. 

In many practical cases it is sufficient to use one-sided tolerance regions 
(limited from above). If n = 2, i.e. if two mutually dependent output 
variables yi and 1/2 are tested, then from (PT|) one obtains 

/? = 1-7^- Ar(l- 7)7^-1 (42) 



which is exactly the same as (j24|) derived for the two-sided tolerance in- 
terval for one output variable. Here, it is worthwhile to cite two sentences 
from "There are several ways to interpret even a simple mathematical 
formula. The problem under consideration decides which interpretation we 
need. Notwithstanding, we should carefully prove the appropriateness of the 
interpretation chosen. " 

Perhaps, it is not superfluous to show how to determine the two-dimen- 
sional one-sided tolerance region for output variables yi and 1/2. First, cal- 
culate from (j42j) the number of observations N needed for the preassigned 
safety level (/?|7). Secondly, create the the sample 



yu,yi2, ■■■,yiN 

1/21,1/22, ■■■,y2N 



43 



and arrange the elements of the first row in increasing order. We obtain the 
matrix 

i/i(2), y,{N) \ 

and choose the element yi{N) as upper limit for yi, i.e. Ui = yi{N). Thirdly, 
search the largest element in the series ?/2ii, Z/2i2; ■ ■ ■ ) ?/2i^_i that gives the 
upper limit for y2, i.e. U2 = maxi<j<N-iy2ij, and finally, construct the 
region [—00, Ui] x [—00, U2] which is tolerance region of variables {yi and 1/2)- 
Clearly, if Ui < U^^^ and U2 < then we can state that the operation of 
the system is safe on the level for the jointly tested two variables (yi 

and 1/2). 

In order to compare the number of runs needed to determine two-sided 
tolerance regions at a given {f3\'~f) level for n ~ 1,2, and 3 mutually depen- 
dent output variables with unknown distributions, we compiled Table Xll. In 
order to achieve the usual safety level (0.95|0.95) we need N = 153 observa- 
tions in the case of two and N — 207 observations in the case of three output 
variables. The number of observations (runs) needed to meet stringent re- 
quirement, e.g. with three output variables the level (7 = 0.98 1/9 = 0.98) 
we need = 598 runs. Therefore it seems to be inevitable to seek methods 
with lower computational demands. 

Table XII. Number of runs needed to determine the two-sided tolerance region for 
n= 1, 2, 3 output variables at listed 7, f3 values. 



/3\7 


0.95 


0.96 


0.97 


0.98 


0.99 


n 




93 


117 


156 


235 


473 


1 


0.95 


153 


191 


256 


385 


773 


2 




207 


260 


348 


523 


1049 


3 




98 


123 


165 


249 


499 


1 


0.96 


159 


200 


267 


402 


806 


2 




215 


269 


360 


542 


1086 


3 




105 


132 


176 


266 


533 


1 


0.97 


167 


210 


281 


422 


848 


2 




224 


281 


376 


565 


1134 


3 




114 


143 


192 


289 


581 


1 


0.98 


179 


224 


300 


451 


905 


2 




237 


297 


397 


598 


1199 


3 




130 


163 


218 


329 


661 


1 


0.99 


197 


248 


331 


499 


1001 


2 




258 


324 


433 


651 


1307 


3 
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In order to provide some insight, let us consider the following example. 
We have two output variables yi and 1/2, their the joint distribution function 
is known: 



(2/1, 1/2) 



1 



27rVl - 



exp 



1 



2(1 -C2) 



{yl - 2Cyiy2 + yj] 



(43) 



where \C\ < 1 is the correlation coefficient of variables yi and y2- We are 
interested in the relationship between the significance level f3 and probability 
content of a given two dimensional region [L, U] = [Li, Ui] x [L2, U2] at the 
number of runs = 50(50)200. 



Table XIII. Levels of significance /3 of two-sided tolerance regions for two output 

variables at listed 7 and N values. 



iV\7 


0.95 


0.96 


0.97 


0.98 






0.8831 


0.7547 


0.5351 


0.2376 


C = 0.1 


50 


0.9433 


0.8775 


0.7442 


0.4970 


C = 0.9 




0.2396 


0.1391 


0.0628 


0.0178 


DF 




0.9109 


0.8836 


0.6297 


0.2121 


C = 0.1 


100 


0.9911 


0.9590 


0.8488 


0.4970 


C = 0.9 




0.7422 


0.5705 


0.3528 


0.1410 


DF 




0.9933 


0.9443 


0.6871 


0.1894 


C = 0.1 


150 


0.9981 


0.9869 


0.9044 


0.5554 


a = 0.9 




0.9452 


0.8542 


0.6616 


0.3528 


DF 




0.9986 


0.9683 


0.7380 


0.1612 


C = 0.1 


200 


0.9998 


0.9955 


0.9414 


0.5779 


C = 0.9 




0.9910 


0.9605 


0.8528 


0.5685 


DF 



Now we can proceed in two ways. The first way is to fix the fraction of the 
samples to fall into the given interval [L, U] , and to determine the associated 
probability f5, from Eq. (j^T|l . these numbers are in row DF (referring to 
Distribution Free). The second way is to use the known joint distribution 
function, calculate the estimates of variances (jj for z = 1, 2 from runs 
and define the interval [Lj, f/j] = [— 2.5aj, -|-2.5aj]. From 10^ random cases 
we estimated the /3 value, see Table XIV. These values are given for two 
correlation coefficient in rows C = 0.1 and C = 0.9. 

As we see in Table XIII. the order statistics gives lower /3 values, in 
most of the cases, compared to those obtained by using the density function 
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N=100 
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0.95 0.955 0.96 0.965 0.97 0.975 0.98 
probability content 



Figure 14: Two output variables. 



g{yi,y2, C)- This indicates a considerable gain from a known distribution 
function of output variables. 

In order to visualize the dependence of the confidence level (3 on proba- 
bility content 7 < three curves are shown in Fig. 14 when = 100. The 
two upper curves correspond to the known bivariate normal distribution of yi 
and 1/2 with A = 2.5, while the curve denoted by DF refers to the distribution 
free case. 



5 Safety Inference 

The purpose of performing safety analysis is to assure that the designed 
equipment can be operated safely. It is a self- understanding premise that 
by altering input data randomly within their prescribed distribution all the 
states will be either safe or unsafe. If both safe and unsafe states would 
occur, the entire range under consideration should be regarded as unsafe. 

-"^^ Without going into details, we mention only there exists a critical value jcriC) such 
that for 7 > 7cr(C) for all A^i < N2 we have P{^,Ni) > l3{j,N2). The critical value 
7cr(C') is defined by the integral J^^ S^\9{yiiy2, C) dyi dy2, similarly to that proved in 
one-dimensional case. See sub-subsection 3.4.2! The decrease of /3 with increasing TV can 
be so large for N > 100 that the /3 becomes smaller than the value obtained from the order 
statistics. This is the case for some (3 values with C = 0.1 in Table XIII. when N > 100 
and 7 > 7cr(0.1) = 0.9753. 
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Our approach has severe consequences on every statement concerning 
safety. The present section assesses those consequences. The first conse- 
quence is that we can not speak of safety of a given state, rather we can 
speak of the probabihty of a given state to be safe. Assume for the sake of 
simphcity that the model is not chaotic around the nominal state (a;o,^o) 
where — Cxq. The input variable (s) may take values in a given range, 
that range is mapped into a range of the output variables. From some other 
considerations, which thus far have not been regarded as part of safety anal- 
ysis, we get information on the probability distribution of the input variable. 
And, we select a range into which a large portion, say more than 90%, of the 
possible area lies with a given high, say 95%, probability. Consequently, we 
conclude that: 

1. It is insufficient to show that the nominal state is safe because there may 
be probable inputs, which are unsafe. Therefore, when the calculations 
are carried out exclusively in the nominal state, safety analysis should 
demonstrate the estimated error to be realistic. 

2. Another possibility is that safety analysis should show that images of 
all X points in the vicinity of xq are safe. In this case we get rid of the 
uncertainty caused by input uncertainty. 

3. Assumptions or knowledge of probability distributions of the input do 
have an impact on safety issues, therefore they must not be treated 
separately. Here two problems occur. Engineering input data are usu- 
ally not accompanied by probability distributions and input variables, 
which are actually internal in the given calculational model, may influ- 
ence the output error decisively. Such internal inputs are usually ob- 
tained from a fitting but that procedure usually gives no information 
on the probability distribution of fitted parameters (although theory 
and technique are known). 

4. Even if every Cx is safe for a given interval, there is a slight chance that 
some input (s) may be associated with unsafe output (s). Those chances 
can be read out from Table XI. for normally distributed output and 
from Tables IX. and X. for a single arbitrarily distributed output vari- 
able, as well as from Table XII. for two and three arbitrarily distributed 
output variables. 
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5. Safety is described by random variables therefore we can make only 
statistical assertions. Any claim concerning safety is associated with 

7) and the assumptions on the probability distribution(s) of the 
input variables. An alarming example is given in sub-subsection 3.2.3 
where we see that 22.4% of the rejected calculations result in larger 
maxima than in the basic sample. 

6. Biased probabiUty density functions seem to be extremely dangerous. 
The simple problem of determining a quantile (see Fig. 4) may lead to 
large differences. Based on the presented examples it seems desirable to 
treat certain class of distributions of the output variables with special 
care. 

7. Safety analysis should make it clear that every output interval lies in- 
side the safety envelope. The safety is not unconditional but the values 
/? and 7 characterize that "level" of safety. When any of inequalities 
yk{N) > k = 1, . . . ,n would be observed, then the system opera- 
tion could hardly be declared safe. This clearly indicates that safety is 
not deterministic, as treated by many, but random. 

Since the general consideration results in loss of a large amount of infor- 
mation, efforts should be made to chose a reasonable test for the estimation 
of the probability distribution of the output variable(s). To this end specific 
safety analysis models should be analyzed individually. All these caveats 
necessitate a reconsideration of safety issues. 

6 Conclusions 

The object of our investigation has been a complex system (e.g. a computer 
code) that we treated as a black box: Prom a well-defined input set the system 
(code) produces a well-defined output set. Both sets have metrics; we can 
speak of distance between two input sets or between two output sets. The 
computer code simulating the complex system is a map C : X ^ y, where X 
is the input set and y is the output set. When analyzing given equipment, 
we have a nominal input xq but the actual input might as well be anywhere in 
hence the input is a random vector. The probability distribution of input 
components is usually derived from diverse engineering considerations. In 
that setting, we have to predict the statistical behavior of the output vector, 
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and, we have to specify a safety envelope into which the actual output falls 
with high probability. The exact statements are formulated as theorems 
in Sections 3 and 4, while the conclusions are summarized as follows. 

1. The nominal state (^xo,Cxo^ is determined from the expectation value 

of the input and from the associated output. The investigation of the 
nominal state alone is insufficient to declare the system operation to 
be safe. 

2. When the distribution of output is not known, four methods, namely 
the Bayesian, the percentile and the sign test as well as the tolerance 
interval methods are proposed for testing the output data. The statis- 
tical statements which can be obtained by these methods do not differ 
significantly from one another. As expected, since the distribution is 
unknown, only a fraction of the information present in the output can 
be utilized. As a result, more runs are needed or lower probabilities 
are achieved. 

3. When the output is of normal distribution. Theorem 3 determines an 
interval [L, U] around the estimated mean value of the output into 
which a larger than a prescribed fraction 7 of the distribution falls 
with preassigned probability (3. The limits L and U are determined by 
the sample estimate of the standard deviation and by a positive factor 
A. For the mostly used N,i3,'-f values the A factors are given in Table 
XI. Our results are in accordance with Ref. [12. and fH] . 

4. When the output consists of more then one statistically not indepen- 
dent quantities, the portion of information content that can be utilized 
rapidly decreases with the number of simultaneously tested output vari- 
ables. That manifests again in a larger number of runs or in lower 
probabilities. This is true for both the sign test and tolerance inter- 
val methods. The results are given in Tables IX. and XII. It is worth 
noting that our results comply with the results given in Ref. ^2] only 
when the output variables are independent. To achieve identical (/3,7) 
level for statistically dependent outputs we need a larger number of 
runs than given in Ref. fT^ . 

^"^The present work was initiated by a remark stating that a limited number of runs 
suffices to determine the safety envelop. 



49 



All these observations may influence, in safety analysis, the application of 
best estimate methods, and underline the opinion that any realistic modeling 
and simulation of complex systems must include the probabilistic features of 
the system and the environment. 

I Appendix. Proof of Theorem 1. 

Obvious, if G{y) is continuous strictly increasing function of y, then 



Introducing the notations A — {z{s) < 7} and A — {z{s) > 7}, we can write 



nvir) <Qy< vis)} = V{y{r) < ^-^(7) < y{s)} 



V{z{r)<^<z{s),} 



which is nothing else than 



V{z{r) < 7 < z{s), } = P{z{r) < 7, z{s) > 7, }• 



that 



V{z{r) < 7} = V{z{r) <-f,A + A} = 
V{z{r-) < 7, z{s) < 7} + V{z{r) < 7, z{s) >> 7}, 



and from this we obtain 



V{y{r) <Q^< y{s)} = V{z{r) < 7, z{s) > 7} 



= V{z{r) < 7} - V{z{r) < 7, z{s) < 7}. 
By using the well known expression 



(I-a) 



V{u < z{r) <u + du, V < z{s) < v + dv} — gr,s{u, v) du dv 



B{r,s-r) B{s,N -s + 1) 
0<u<v <1, 



du dv, 



(I-b) 



we obtain that 




(I-c) 



50 



The first integral: 

wliere 
hence 



9r{u) 



gr,siu, v) du dv = / griu) du, 
Jo 

U^-l (1 _ y)A^-r 



B{r,N -r + 1)' 



Yi = I{'y,r,N - r + 1). 
Taking into account that v > u the second integral is nothing else than 

n n 




gr,s{u, v) du dv 



Jo 



dv / uJ'~^ {v - uy-''-^ (1 - v)^-' du, 
Jo 



where 



1 



B{r;s-r) B{s,N - s + iy 
By performing the transformations 

u = tit2 and V = t2 

the integral Y2 can be easily calculated. Since 



J 



du 


du 


9*1 


dt2 


dv 


dv 




at2 



we find that 

Y2 = Cr,s I t2dt2j {ht2r-Hh-ht2r-''-' {i-t2r'' dh 

'0 Jo 

f\\~^ (l-ti)^-'--' dh 
Jo 

Replacing Cr^s by p-c|) one obtains that 

Y2 = I{i,s,N -s + l). 



n 
Jo 



2)^ ^ dt2. 
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By using the well known identity J(c, a,b) = 1 — /(I — c, b, a), finally we have 

P = V{y{r) <Q,< vis)} = Y,-Y,= 

= J(l - 7, AT - s + 1, s) - /(I - 7, iV - r + 1, r), 
and so the Theorem 1 is proven. 

II Appendix. Proof of Theorem 2. 

The derivation of Eq. (22) is based on the following observation. Let 

G{y) = r g{t) dt 



be the unknown but continuous cumulative distribution function of the out- 
put variable y. Let yi, . . . ,yN he its independently observed values. Arrange 
the sample elements yk, k = 1, . . . , N in increasing order, and denote by 
y{k) the kth element of the ordered sample. Introduce the random variables 
z{k) = G[y{k)], k = 1, . . . , N which are are not independent [U]. According 
to p-b|) the bivariate density function of z{s) and z{r), r < s is given by 

(N) / \ r-l / \r-s+l N-s 

^(-) [r-l)\[s-r + l)\[N-s)\ " ' 

<u<v <l. 
In order to determine the probability 



V \ / dG{y) > 7 

Jv(r) 



'y(r) 

= V {G[y{s)] - G[y{T)] > 7} = P {z{s) - z{t)] > 7} , 
we need the probability 

V{t<z{s)- z{r) <t + dt}= wf^j) (t) dt= g[^J^ {u, u + t)du dt. 

J 

(Il-a) 
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Substituting gi^\u,v) into pi-a|) . the integration in pi-a|) can be carried 
out: 

1 







{r,s)K J B{r,s-r) B{s,N -s + 1) Jo 

(Il-b) 

where B{j, k) is the Euler beta function. Taking this expression into account, 
we get 

V {z{s) - z{r) > ^} = 



after integration we obtain 

B{s -r,N-s + r + l) 



V{z{s)-z{r)>^} = l- -^—^—-l^-^dt. 







In other words, 

/5 = l-/(7,s-r,Ar-s + r + l) (II-c) 

as stated. Q.E.D. 

Ill Appendix. Proof of Theorem 3. 

The proof of the Theorem 3 is based on a few well known relations of math- 
ematical statistics. By the definition of conditional probability, 

/ + 00 
V{p{z, S) > 7|5 = /i} dV{z < 12}, (Ill-a) 
-00 

where 



Since 

V{p{z, s) >^\z = fi} =V {p(/i, s) > 7} , 

we have 

/ j\T r+oo 

V{p{z, S) > 7} = y ^ y V {p(/i, S) > 7} e-^'^^/^ d^^. (Ill-b) 
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where 



' /I— As 

is random variable. Let us define the function 



1 



rfi+Xs 

r(u, As) = ^= / e~^^^'^ dz 



' fM — Xs 

for real s. If fi and A are fixed, then r(/i, As) is strictly monotonously in- 
creasing function of s, therefore the equation 

"fj,+Xs 
' fi—Xs 

has only one root in s. It is clear that As is independent of A, hence we may 
write As = q{fi,'y) and q is obtained from 

' fi-q 

It follows from the property of r(/i, As) that the probability of p{fi, s) > 7 
equals to the probability of s > s^, i.e. 

V {pifl, ~s)>^}=V{~S>Sr}=vl~S> ] . (III-C) 



/27r 



A 



By using this relations we can write that 



•p <; 5 > ^ilhJlX = r (52 > = v[-> 



A J • ' A2 J • I a2 A2 
and taking into account that the random variable 

7^2 N -\2 



n=0 

is of distribution with (A^ — 1) degree of freedom jH], we get 

A 

where 



V {pifi, s~) > 7} = 1 - K^-i {{N-1) ll^v^lLlZL I. , (iiLd) 



2r(iV/2) io 

Substituting pil-dj) into pil-bj) we get the theorem proven. Q.E.D. 
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IV Appendix. Proof of Theorem 4. 

Before setting out for the proof of Theorem 4, we set forth the following 
notation. Let 

H{X,^\^^) = V{p{^Irs)>l}. 

We need 



Lemma 1 It can be shown that 

H{X,^\1/Vn) - W{X,-f,N) = 0{N-'^). 

where 



(IV-a) 



/ AT r+oo 

W{X,^,N) = ^— j H{X,j\fi)e~'^'/'d^i. (IV-b) 



Proof of Lemma 1. The expression 



H{x,-f\^,) = v 



d 



z > ^ 



(IV-c) 



'2tT J^-Xs 

is an even function of fi and can be developed into Taylor series around /i = 



as 



5'"^(A,7l/i) 



2n 



H{X,^\f,) = J2 

n=0 

Substituting pV-d|) into pV-c|) we obtain 

-d'-HiXnlfi) 



2n 



=0 (2n)!' 



(IV-d) 



ra=0 



2n 



H{X,j\0) + 



dfJL 



d'H{X,^\f,) 



J /i=0 



(2n-l)!! 1 

(2n)\ AT" 



-^ + 0(iV-V2), 



2N 



(IV-e) 



and replacing /i by 1 / in pV-d|l , we have 

'^HiXrM 



if(A,7|l/ViV) = if(A,7|0) + 



fj.=0 



2N 



0{N 



-l/2^ 



(IV-f) 
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From Eqs. pV-t]) and pV-e|) follows that pV-a|) is true. Consequently , we 
have the following approximate equation 



i/(A,7|l/ViV) = 1-Kn-i 



A2 



where the argument of Kn^i[- ■ ■] is nothing else than the (1 — /3) percentile 
of distribution with (A^ — 1) degree of freedom. Introducing the notation 



A2 



we find that 



N-1 



QN-iil-/3) 
This completes the proof of Theorem 2. Q.E.D 



g(i/viv,7). 



(IV-g) 



V Appendix. Proof of Theorem 5. 

The proof of Theorem 5 is given in two steps. In the first step we show that 
the Theorem holds for n = 2, and then we generalize the claim for n > 2. 

Step 1. We assume that the unknown joint distribution function of two 
output variables yi and y2 is given by 

/yi ry2 
/ g(ti,t2) dti dt2, 
oo J — oo 

and denote by 

r+oo 

9i{yi) = / givi^h) dt2 



J — oo 

the density function of the output variable y\. Let us consider the following 
random variable 

^IJ\ l'U2 

A2 = A2iLi,Ui, 12,1/2) = / g{yi,y2) dyidy2, (V-a) 

JLi J L2 

where the boundaries of the integration are random variables. The limits 
were discussed in Section 4.2.2. A2 can be expressed almost surely as 

^2(^1, f/i, L2, U2) = C2{L2, U2\L,, f/i) Ai{L^, f/i). (V-b) 
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Here 



AiiL,,U,)= [ \i{yi)dy, (V-c) 

J Lx 

and 

C2(L2,t/2|Li,t/i) = / <p2{y2\LuU{) dy2. (V-d) 

where 

My2\Li, Ui) = — — -^j— — - = (V-e) 

Loo dy2 Jl, 9iyi,y2) dyi Ai[Li,Ui) 

is the random density of variable y2 under the condition that yi lies in [Li, Ui]. 
Since Ai{Li,Ui) = Gi[yi{si)]—Gi[yi{ri)], using relation (IV-b), we find that 

,si-ri-l _ , \N-si+ri 

V {h < Ai{L,, f/i) <h + dh} = — i \ dh = 

B{si - ri, N - si + ri + 1) 

= k^^^l^^ih) dt,. (V-f) 

To obtain the density function of C2{L2,U2\Li,Ui), we define the random 
probability measure 

G{t\L,,U,)= I My2\Li,Ui) dy2, 

J — oo 

with which we can express C2 as 

C2(L2, f/2|Li, U,) = G[y2{s2)\L,, U,] - G[y2ir2)\L^, U^], 
where ri < r2 < • ■ ■ < S2 < si. Finally we get 

V {t2 < €2(12, U2\Li, Ui) <t2 + dt2} = 



B{s2 - r2, Si - ri - S2 + rg"! ""^^ "'(^2,^2) 



dt2 = irrl2r'ih)dt2. (V-g) 



Note that expression ( V-g ) contains neither Li nor Ui, therefore, distri- 
bution of random variable C2 is independent of Li and Ui. Consequently, 
the joint density distribution of Ai and C2 is the product of ()V-f|) and ( |V-g| ). 
We still need the density function of the random variable A2- Exploiting the 
independence of C2 and Ai, we get 
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V{t< A(Li, Ui, L2, U2)<t + dt} 



' I 4s::ir'^(v^) dt = wj,M dt. (v-h) 



Substituting here (jV-f|) and ( |V-g ) and performing the indicated calculations 
we obtain: 

j.S2-T2-l f\N-S2+r2 

= m AT \ ^^-^) 

B{S2 - r2, N - S2 + r2 + 1) 
From this, immediately follows 

V{A2{L,,UuL2,U2)>i} = 

B{-f,S2-r2,N - S2 + r2 + l) , 
B(s,-r,,N-s, + r, + l) - ^ - Hi, S2 - r„ N - + n + 1). 

This completes Step 1. 

Step 2. Now we generalize the above result for n > 2. Let us assume that 
the unknown joint probability distribution of the output variables yi, . . . ,yn 
is given by 

■■■ g{vi, ...,Vn) dvi ■ --dvu- 

-oo J —infty 

Our task is to derive the probability distribution of the random variable 



rUn ru- 
Ap{Li,Ui, . . . , Ln,Un) = / / 

J Ln J Ll 



giVi, ■■■,yn) dyi ■ --dyn, 



which is an n-fold integral over the n output variables. We introduce an 
intermediate term, in which an i-fold definite integral over the first i variables 
is involved, and the rest of the variables are integrated over the [— oo, +oo] 
range: 

•Ai {Ll, Ui, . . . , Li, Ui) = 
dyn--- 1 dy^+i / dyi--- dyi g{yi, ...,yn), 



and 



Ui-i rUi 



{yi\Li,Ui,. . . ,Li_i,Ui-i) = ^— I dyi_i--- I dyi g{yi, . . . ,yn), 
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which is the random density of the variable Ui under the condition that 
Lj < Vj < Uj, j = 1, . . . , i — 1. As we did in (IV- d|) . we introduce a random 
probabihty measure associated with the condition that the first (^ — 1) output 
variables lie in the interval assigned to them by [Lj, Uj], j = 1, . . . ,i — 1: 



Ci = Ci {Li, Ui\Li, Ui,..., Li_i, Ui_i) = (pi {yi\Li, Ui, . . . , Li_i, Ui_i) dyi 

J Li 



-Ui 

The above defined Ais obey the recursion 

A+i = d+i Ai. (V-j) 

Lemma 2 The probability of finding Ai in the interval + dti] is given 
by 

V {t, <A,<t, + dt,} = -f \- ^ — - dti. (V-k) 

B{si - ri,N - Si + ri + 1) 

Proof of Lemma 2. Eq. ()V-k|) is certainly true for z = 1, 2 because 
Ai = AqCi=Ci= I g{y) dy 

JLi 

and 

A2 = C2 Ai = C2{L2,U2\Li,Ui) Ai{Li,Ui) = / g{yi,y2) dyi dy2. 

Now we assume that ()V-k|) is true for i = j and show that it is true also for 
i = j + 1. Note that Aj and Cj+i are statistically independent because 



V {tj+i < Cj+i < tj+i + dtj+i} 



^ dti- 



B[Sj+i — Sj — Tj — Sj+i + rjj^i) 

does not involve the quantities Lj, Uj, . . . , Li, Ui, which occur in Aj. The 
joint density function of Aj and Cj+i takes the form of the joint density 
function of Ai and C2 in the case of n = 2. Hence the density function of 

Aj Cj+i '^i+i 
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is obtainable from Eq. ()V-i|) by substituting r^+i for r2 and for S2, i.e. 

V {tj+i < Aj+i < tj+i + dtj+i} = 



B{sj+i - r^+i, N - Sj+i + Tj+i + 1) 

Hence, Eq. flVT|l is proven for i = 1,2, This completes Step 2. 

Furthermore, the density function of An is given by 

V{t<A.<t^M}^ ... (0 it ^ ^^^^ _ 'It. 

It is interesting to note that the density function of An does not depend on 
the integers ri, si, . . . , r„_i, s„_i. This completes the proof of Theorem 4. 
Q.E.D. 



References 

[1] A. Gandini: Uncertainty Analysis and Experimental Data Transposi- 
tions Methods Based on Perturbation Theory, in: Y Ron en (Ed.): Hand- 
book of Uncertainty Analysis, CRC Press, Boca Raton (FL), 1988 

[2] A. Cuba, M. Makai and L. Pal: Reliab. Engng. Sys. Safety, 80, 217 
(2003) 

[3] M. Kendall and A. Stuart, The Advenced Theory of Statistics, vol. 2, 
London, Charles Griffin, 1979 

[4] M.J. Burwell et ai: The Thermohydraulic Code ATHLET for Anal- 
ysis of PWR and BWT Systems, NURETH-4, Karlsruhe, 1989 ; H. 
Austregesilio, H. Dellenbeck: ATHLET Mod 12 Cycle A, Programmers 
Manual, vol. 1., CRS, March 1998 

[5] H. C. Claeser et al.: Uncertainty Analysis of a Post-Experiment Calcu- 
lation in Thermal Hydraulics, Reliability Engineering and System Safety, 
45, 19 (1994) 

[6] L. Pal: Fundamentals of Probability Theory and Statistics, vol. LII, 
Akademiai Kiado, Budapest, 1995, in Hungarian 



60 



[7] C.J. Clopper and E.S. Pearson: Biometrica, 26, 404 (1934) 

[8] S.S. Wzlks: Annals of Math. Stat. 12, 91 (1941) and Annals of Math. 
Stat. 13, 400 (1942) 

[9] A. Wald: Annals of Math. Stat. I4, 45 (1943) and Annals of Math. 
Stat. 17, 208 (1946) 

[10] H. Robbins: Annals of Math. Stat. 15, 214 (1944) 

[11] R.R. Odeh and D.B. Owen: Tables for Normal Tolerance Limits. Sam- 
pling Plans, and Screening, Marcel Dekker , New York, (1980) 

[12] B. Krzykacz, EQUUS - A Computer Program for the Derivation of Em- 
pirical Uncertainty Statements of Results from Large Computer Models, 
GRS-A-1720, 1990 

[13] M. Makai and L. Pal: Reliab. Engng. Sys. Safety, 80, 313 (2003) 



61 



